JPO and NCIPI are not responsible for any damages caused by the use of this translation.

1. This document has been translated by computer, so the translation may not reflect the original precisely.

2. **** indicates a word that could not be translated.
3. Words in the drawings are not translated.



CLAIMS 



[Claim(s)] 

[Claim 1] An image retrieval apparatus comprising: a word dictionary that stores information for analyzing a language sentence; an image database that stores image components, each consisting of pixel data and attribute values that quantify the description of said pixel data; an object data management means that stores, for each object that has a physical entity, object data consisting of the name of said object, general information about the object, and pointers to the image components depicting the entity of the object; an input means that inputs a language sentence; a language analysis means that analyzes a language sentence using said word dictionary, extracts from the analysis result the objects that can be expressed as images, and retrieves the object data for said objects from the object data management means; and a display means that displays the pixel data of image components; the image retrieval apparatus being characterized in that the language sentence input with said input means is analyzed by said language analysis means, and, for the extracted object data, the pixel data of the image components pointed to by the pointers of said object data are distinguished for each object and displayed by the display means.

[Claim 2] The image retrieval apparatus according to claim 1, further comprising: an object identification means that identifies, using the general information of said object data, whether an object represents a background or a part; and an image selection means that receives two or more object data, extracts for each object the pixel data of the image components pointed to by the pointers of the object data, and selects only the pixel data pointed to by all of the object data; characterized in that, for the objects extracted by analyzing the language sentence input with said input means with said language analysis means, the object identification means identifies each object as a background or a part, and, when two or more objects represent a background, the pixel data selected from said group of objects by the image selection means, rather than pixel data distinguished for each object, are used as the retrieval result for the objects representing a background.

[Claim 3] The image retrieval apparatus according to claim 2, further comprising a decision rule for undetermined objects that stores rules for deciding whether an undetermined object is a background or a part, characterized in that the object identification means identifies, as necessary, whether each object extracted by the language analysis means represents a background or a part by reasoning with the general information of the object data and said decision rule for undetermined objects.

[Claim 4] The image retrieval apparatus according to claim 3, characterized in that each image component has at least identification information indicating whether the object corresponding to the entity depicted by its pixel data is a background or a part, and that image components are selected, as necessary, using said identification information and the result of inferring whether each object represents a background or a part.

[Claim 5] The image retrieval apparatus according to claim 1, characterized in that the attribute values of an image component include at least the name of the object corresponding to each entity depicted by the pixel data, and that the attribute values are distinguished for each entity depicted by the pixel data.

[Claim 6] The image retrieval apparatus according to claim 1, characterized in that, when an object has conceptually lower-level objects, its object data have pointers to the object data corresponding to those lower-level objects, so that retrieval can proceed further to the lower-level object data by following said pointers.



[Translation done.] 
DETAILED DESCRIPTION



[Detailed Description of the Invention] 
[0001] 

[Industrial Application] This invention relates to an apparatus that retrieves two or more image components using natural language, and in particular to an apparatus that makes it easy to retrieve the images to be pasted into a document when a document such as a pamphlet or a manual is prepared with a word processor or the like.
[0002] 

[Description of the Prior Art] Because images are more concrete than language and carry more information, they are used abundantly in documents of all kinds, such as pamphlets and magazines. Moreover, word processors with a function for pasting photographs and the like into a document are now widespread, so that such documents can be produced even at home. When a word processor or the like provides this function, the most important requirement is that the needed image can be retrieved easily. Meanwhile, techniques are known that take a language sentence as input and retrieve images expressing the content of that sentence; see, for example, [Takahashi, Shima, Kishino, "The image database retrieval system using physical relationship", Technical Report of the Institute of Electronics, Information and Communication Engineers, PRU 89-80, pp. 23-28]. For every image stored in advance in the database, this technique provides, as image retrieval information, a model of the image content that describes the relations between the entities in the image using a hierarchical representation, the E-R model (Entity-Relationship representation), and the like. Retrieval with this method proceeds as follows. First, a sentence describing the content of the desired image is input. Next, language analysis converts the content expressed by the input sentence into data whose similarity to the image retrieval information can be computed. Finally, the similarity between these data and each item of image retrieval information is computed, and the images are displayed in descending order of that score.
[0003] 

[Problem(s) to be Solved by the Invention] This technique requires a model to be built in advance for every stored image. However, such modeling is difficult to automate and must be performed manually, so it takes an immense amount of time and effort.

[0004] In addition, the method must convert the input sentence into data whose similarity to the image retrieval information can be computed. If the syntax of the input sentence is not restricted, the inference rules for this conversion are difficult to create; conversely, if the syntax is restricted to suit the inference rules, the grammatical freedom of the input sentence becomes small and the system becomes hard to use.

[0005] Furthermore, because only images composed as complete scenes can be retrieved, this retrieval method cannot be considered suitable for image retrieval in a system that edits images by combining an image showing a background with parts, cut out from ordinary images, that make up a scene.

[0006] Therefore, the technical problem is to develop a method that takes as input a sentence with comparatively high grammatical freedom and retrieves all the images required to depict the content expressed by that sentence.

[0007] 

[Means for Solving the Problem] To solve the above technical problem, the image retrieval apparatus of this invention comprises: a word dictionary that stores information for analyzing a language sentence; an image database that stores image components, each consisting of pixel data and attribute values that quantify the description of said pixel data; an object data management means that stores, for each object that has a physical entity, object data consisting of the object's name, general information about the object, and pointers to the image components depicting the object's entity; an input means that inputs a language sentence; a language analysis means that analyzes a language sentence using said word dictionary, extracts from the analysis result the objects that can be expressed as images, and retrieves the object data for those objects from the object data management means; an object identification means that identifies, using the general information of said object data, whether an object represents a background or a part; an image selection means that receives two or more object data, extracts for each object the pixel data of the image components pointed to by the pointers of the object data, and selects only the pixel data pointed to by all of the object data; a decision rule for undetermined objects that stores rules for deciding whether an undetermined object is a background or a part; and a display means that displays the pixel data of image components.
[0008] 

[Function] When the content of the image to be created is input in natural language, the object identification means identifies, for each object data retrieved by said language analysis means from the language sentence input with said input means, whether the object represents a background or a part. If two or more objects represent a background, the pixel data selected from that group of objects by the image selection means, rather than pixel data distinguished for each object, become the retrieval result for the backgrounds. If there is only one background object, the pixel data retrieved from the object data of that background object become the retrieval result. For parts objects, pixel data are retrieved separately for each extracted object.
[0009] 

[Example] An embodiment of this invention is described below with reference to the drawings.

[0010] Drawing 1 is a block diagram showing an embodiment of the image retrieval apparatus according to this invention. The apparatus consists of: an input means 101 that inputs a language sentence; a language analysis means 102 that extracts from the language sentence the objects that can be expressed as images and retrieves the object data corresponding to those objects; an object identification means 103 that identifies whether an object represents a background or a part; a decision rule for undetermined objects 109; an image selection means 104 that receives two or more object data and selects only the pixel data containing all of those objects; a display means 105 that displays pixel data; a word dictionary 108; an object data management means 107 that stores object data; and an image database 106 that stores image components.

[0011] Drawings 2 (a) to (f) are examples, in list form, of the object data held by the object data management means 107. Object data consist of the object's name 201, pointers to lower-level objects 202, general information about the object 203, and pointers to the image components depicting the object's entity 206. In this example they are distinguished by four labels: CLASS, LOWER_CLASS, COMMON_KNOWLEDGE, and IMAGE_FILE.

[0012] CLASS is the label for the description section 201 of the object's name according to this invention and is written as (CLASS name). The example of drawing 2 (a) describes the object "mountain".

[0013] LOWER_CLASS is the label for the description section 202 of the names of conceptually lower-level objects according to this invention and is written as (LOWER_CLASS lower-level object name, or a list of names). In the example of drawing 2 (a), "Mt. Fuji", the "Alps", and others are described as lower-level objects of the object "mountain". When the lower-level object data of a given object need to be retrieved, it suffices to refer to this part. Drawing 3 shows such conceptual higher-lower relations among objects.

[0014] For an object such as the "Alps" of drawing 2 (b), whose concept is already sufficiently concrete and cannot easily be made more specific, this is expressed simply by not describing these data.
[0015] COMMON_KNOWLEDGE is the label for the description section 203 of the object's general information according to this invention and is written as (COMMON_KNOWLEDGE general information, or a list of it). In this example, (CLASS_ATTR BG) and (ATTR_VAR (HEIGHT PLACE)) are described as general information about the object "mountain"; each is explained below.

[0016] CLASS_ATTR is the label for the information 204 according to this invention that identifies whether the object represents a background or a part; the identification is given by the following term. In this example three values are provided, BG, PARTS, and NOT_DEF, with the following meanings.

[0017] BG: an object that represents a background.
PARTS: a foreground object (part) that exists within a background.
NOT_DEF: an object whose status as a background or a part is decided by the situation.
Accordingly, the "mountain" of the example of drawing 2 (a) is identified as an object representing a background.



http://www4.ipdl.ncipi.go.jp/cgi-bin/tran__web_cgi_ejje



7/21/2005



JP,G6-1 19405 ,A [DETAILED DESCRIPTION] Page 3 of 6


NOT_DEF, by contrast, describes an object such as "train" in the following two sentences:
"The train is running along the foot of the mountain."
"The sea can be seen from the window of the train."

Even under the same object name, such an object can be a background or a part depending on the situation: here the object "train" depicts the exterior of a train in the first sentence and the interior of a train in the second. When such an object is extracted as a result of sentence analysis, it is determined whether the object is a background or a part. The method is explained in detail with an example in the description of the image retrieval processing below.
[0018] ATTR_VAR is the label for the information 205 according to this invention about the attributes used to differentiate the images depicting the object's entity; the attributes are enumerated in the following term or list. In the example of drawing 2 (a), two attributes are defined, HEIGHT (height) and PLACE (location), so that concrete values of HEIGHT and PLACE can be given to the images depicting the entity "mountain". Using these values, the images depicting a "mountain" can also be evaluated by the technique described in patent application H03-255025 and elsewhere, and retrieved in descending order of evaluation score.

[0019] IMAGE_FILE is the label for the pointers 206 to the image components depicting the object's entity according to this invention; the names of the files storing the data of those image components are enumerated in the following term as pointers to the image components. The example of drawing 4 (b) shows that [Geneva1], [Alps1], and others are files storing images that include the "Alps" as an entity.
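The list-form object data of drawing 2 can be pictured as a small record type. The Python sketch below is illustrative only: the `ObjectData` class, its field names, the `OBJECT_DB` dictionary standing in for the object data management means 107, the `has_images` helper, and the `CLASS_ATTR` value assumed for "Alps" are hypothetical choices, not definitions taken from this specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectData:
    """Hypothetical in-memory form of one object-data entry (drawing 2)."""
    # CLASS (201): the object name, optionally followed by synonyms
    names: List[str]
    # LOWER_CLASS (202): names of conceptually lower-level objects
    lower_class: List[str] = field(default_factory=list)
    # COMMON_KNOWLEDGE (203) / CLASS_ATTR (204): "BG", "PARTS" or "NOT_DEF"
    class_attr: str = "NOT_DEF"
    # COMMON_KNOWLEDGE (203) / ATTR_VAR (205): attributes that differentiate images
    attr_var: List[str] = field(default_factory=list)
    # IMAGE_FILE (206): file names of the image components depicting the entity
    image_files: List[str] = field(default_factory=list)


# Stand-in for the object data management means 107, keyed by primary name.
OBJECT_DB: Dict[str, ObjectData] = {
    # Drawing 2 (a): no image components are registered directly for "mountain"
    "mountain": ObjectData(
        names=["mountain"],
        lower_class=["Mt. Fuji", "Alps"],
        class_attr="BG",
        attr_var=["HEIGHT", "PLACE"],
    ),
    # Drawing 2 (b): a concrete object, so LOWER_CLASS is left empty.
    # Its CLASS_ATTR is an assumption made for this sketch.
    "Alps": ObjectData(
        names=["Alps"],
        class_attr="BG",
        image_files=["Geneva1", "Alps1"],
    ),
}


def has_images(name: str) -> bool:
    """True if the object points directly at any image component."""
    return bool(OBJECT_DB[name].image_files)
```

Leaving `lower_class` empty for "Alps" mirrors [0014]: a sufficiently concrete object simply omits the LOWER_CLASS description.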

[0020] Drawings 4 (a) and (b) are examples, in list form, of the image components held in the image database 106.

[0021] An image component consists of pixel data 401, identification information 402 indicating whether the object corresponding to the entity depicted by the pixel data is a background or a part, and attribute values 403 that quantify the description of the pixel data. These are distinguished by three labels: IMAGE, IMAGE_ATTR, and BELONG_CLASS.

[0022] IMAGE is the label for pixel data. Normally it records the information needed to display an image, such as the color information of each pixel; for convenience, the following explanation assumes that the term following IMAGE is a pointer to that information.

[0023] IMAGE_ATTR is the label for the identification information 402 according to this invention indicating whether the object is a background or a part; the identification is given by the following term. Here two values are provided, BG and PARTS, whose meanings are the same as in the object data. In the example of drawing 4 (a), the image component can be identified as one depicting a background.
[0024] BELONG_CLASS is the label for the attribute values 403 that quantify the description of the pixel data of an image component according to this invention; the values are distinguished for each entity depicted by the pixel data. Therefore, when the pixel data depict two or more entities, one BELONG_CLASS entry is described for each of them. The term following BELONG_CLASS gives, by object name, the object to which the attribute values belong, and the items written after it are the concrete values of the attributes defined for that object. For example, drawing 4 (a) shows that the pixel data [Geneva1.img] contain the objects "Alps", "Lake Leman", and "Geneva", and that, for the attributes of the "Alps" defined in its higher-level object "mountain", the height (HEIGHT) is 3500 meters and the location (PLACE) is Switzerland. By contrast, when the pixel data depict only one entity, as in drawing 4 (b), only one BELONG_CLASS entry is described.
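An image component of drawing 4 can likewise be sketched as pixel data plus one BELONG_CLASS entry per depicted entity. This is a minimal Python model under stated assumptions: the class and method names are hypothetical, IMAGE is reduced to a file-name string as [0022] permits, the height is taken to be in meters, and no attribute values are filled in for "Lake Leman" or "Geneva" because the text gives none.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

Value = Union[int, str]


@dataclass
class BelongClass:
    """One BELONG_CLASS entry (403): an entity depicted in the pixel data."""
    object_name: str
    # Concrete attribute values, e.g. {"HEIGHT": 3500, "PLACE": "Switzerland"}
    values: Dict[str, Value] = field(default_factory=dict)


@dataclass
class ImageComponent:
    """Hypothetical in-memory form of one image component (drawing 4)."""
    image: str        # IMAGE (401): treated as a pointer to the pixel data
    image_attr: str   # IMAGE_ATTR (402): "BG" or "PARTS"
    belong_class: List[BelongClass] = field(default_factory=list)

    def depicted_objects(self) -> List[str]:
        """Names of all entities recorded for this pixel data."""
        return [b.object_name for b in self.belong_class]

    def value(self, object_name: str, attr: str) -> Optional[Value]:
        """Attribute value for one depicted entity, or None if not recorded."""
        for b in self.belong_class:
            if b.object_name == object_name:
                return b.values.get(attr)
        return None


# Drawing 4 (a): one picture showing three entities, hence three entries.
GENEVA1 = ImageComponent(
    image="Geneva1.img",
    image_attr="BG",
    belong_class=[
        BelongClass("Alps", {"HEIGHT": 3500, "PLACE": "Switzerland"}),
        BelongClass("Lake Leman"),
        BelongClass("Geneva"),
    ],
)
```

Keeping the values per entity, rather than per picture, is what lets the "Alps" height be stored without implying anything about "Geneva" in the same image.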

[0025] Drawing 5 is an example, in tabular form, of the decision rule for undetermined objects 109. In this example, when an undetermined object is extracted from an input sentence, the numbers of background, part, and undetermined objects are counted and matched against the table to decide whether the undetermined object is treated as a background or as a part. In the table, the notations "-", "**", "O", and "O" denote, respectively, zero objects, one object, one or more objects, and two or more objects (the last two are distinct symbols in the original that both appear as "O" in this translation). The method of deciding undetermined objects with this rule is described below in the explanation of the object identification means.
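A table lookup of the kind described for drawing 5 might look like the following sketch. The four codes standing for the count notations, the first-match rule, and the single row in `EXAMPLE_TABLE` are assumptions for illustration; the actual rows of drawing 5 are not reproduced in this text, which only confirms (in [0044]) that counts of 3, 0, and 1 fall under type E with the result "parts".

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical codes for the four count notations of drawing 5.
COUNT_MATCHERS: Dict[str, Callable[[int], bool]] = {
    "0": lambda n: n == 0,    # "-"  : zero objects
    "1": lambda n: n == 1,    # "**" : one object
    "1+": lambda n: n >= 1,   # one or more objects
    "2+": lambda n: n >= 2,   # two or more objects
}

# (type, background column, parts column, undetermined column, result)
Row = Tuple[str, str, str, str, str]


def find_type(table: List[Row], n_bg: int, n_parts: int,
              n_undetermined: int) -> Optional[Tuple[str, str]]:
    """Return (type, result) of the first row whose three count columns match."""
    for row_type, bg, parts, und, result in table:
        if (COUNT_MATCHERS[bg](n_bg)
                and COUNT_MATCHERS[parts](n_parts)
                and COUNT_MATCHERS[und](n_undetermined)):
            return row_type, result
    return None  # no row applies


# Placeholder row only; it is NOT the content of drawing 5.
EXAMPLE_TABLE: List[Row] = [
    ("E", "2+", "0", "1", "PARTS"),
]
```

Expressing each cell as a predicate keeps the table declarative: adding or reordering rows changes behavior without touching `find_type`.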

[0026] In drawing 1, the language analysis means 102 analyzes the language sentence input with the input means 101 using the word dictionary 108 and extracts from the input sentence the objects that can be expressed as images; drawing 6 shows an example of its operation.
[0027] First, a sentence is input with the input means (601), and the input sentence is divided into words by morphological analysis using the word dictionary 108 (602). For the morphological analysis technique and the construction of the word dictionary 108 for it, the method described in, for example, [Hidaka et al., "Basic morphology of natural language understanding", Information Processing Society of Japan, Vol. 30, No. 10, pp. 1169-1175 (1989)] may be used.

[0028] Next, the words expressing objects that can be depicted as images are extracted from the obtained words (603). For each obtained word, it is checked whether it matches the object name 201 of any object data registered in the object data management means 107. If object data whose name matches the word exist, the word is taken to express an object that can be depicted as an image, the object data are passed to the object identification means 103, and the same processing is performed on the next word. If there are none, the word cannot be expressed as an image, and the same processing is performed on the next word. When object data retrieval has been completed for all words in this way, the language analysis means 102 ends its processing.

[0029] With the object data of the example of drawing 2 (a), only "mountain" is written as the name, so a word with the same meaning, such as "Maung Teng", cannot be used as a retrieval word. However, by enumerating in advance, as object names, the various names by which the object may be expressed in an input sentence, as in (CLASS (mountain Maung Teng)) of drawing 2 (g), more words expressing the object can be used.
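The word-matching step (603) and the synonym listing described above can be sketched as a dictionary lookup. This minimal Python example assumes the input has already been segmented by morphological analysis (602), uses exact string matching, and relies on hypothetical names (`build_name_index`, `extract_objects`, `CLASS_NAMES`); the object names are taken from the worked example in [0042] to [0047].

```python
from typing import Dict, List

# Hypothetical CLASS entries: object-data key -> all names listed under CLASS
CLASS_NAMES: Dict[str, List[str]] = {
    "mountain": ["mountain", "Maung Teng"],  # synonym listing as in drawing 2 (g)
    "lake": ["lake"],
    "Geneva": ["Geneva"],
    "helicopter": ["helicopter"],
}


def build_name_index(class_names: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every name listed under CLASS to the key of its object data."""
    index: Dict[str, str] = {}
    for key, names in class_names.items():
        for name in names:
            index[name] = key
    return index


def extract_objects(words: List[str], index: Dict[str, str]) -> List[str]:
    """Keep, in input order, the words that name registered object data (603)."""
    found: List[str] = []
    for word in words:
        key = index.get(word)
        if key is None:
            continue  # no object data: the word cannot be depicted as an image
        if key not in found:
            found.append(key)  # would be passed on to the identification means 103
    return found
```

Indexing every synonym once up front keeps the per-word check to a single dictionary lookup, however many names each object lists.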
[0030] In drawing 1, the object identification means 103 identifies whether an object represents a background or a part by reasoning with the general information 203 of the object data and the decision rule for undetermined objects 109; drawing 7 shows an example of its operation.

[0031] For an object that can be limited to either a background or a part, which of the two it is is described in advance in the general information 203 of its object data, and identification is performed by referring to that general information. In the object data of the example of drawing 2, the identification information in the general information is written as (CLASS_ATTR BG) or (CLASS_ATTR PARTS), which makes identification possible. However, for an object whose CLASS_ATTR in the object data is NOT_DEF, whether it is a background or a part depends strongly on the situation, as explained above, so identifying it requires inference that uses the situation. In this example, the situation is the result of identification, by the object identification means 103, of the object data obtained by analyzing the input sentence. How a background or a part is identified for undetermined object data, using that result and the decision rule for undetermined objects of drawing 5, is explained below.

[0032] First, the object data extracted by the language analysis means 102 are classified as background, part, or undetermined, the number of object data in each class is counted (701, 702), and it is checked whether any undetermined object exists (703). If an undetermined object exists, whether it is a background or a part is decided as follows (704). The numbers of objects in each class are compared with the second to fourth columns of the table of drawing 5 to find the type of the sentence analysis result, one of A to F in the first column of the table. The entry in the last column for that type is then taken as the identification result for the undetermined objects. In the table of drawing 5 of this example, for types B and D the identification result for the undetermined class is shown as "(parts)", enclosed in parentheses; this means that, when image components are actually retrieved, priority is given to retrieving them as parts.
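The flow of drawing 7 (steps 701 to 704) can be sketched as follows. In this hypothetical Python version the decision rule 109 is passed in as a function `decide` that returns "BG" or "PARTS", and one verdict is applied to all undetermined objects, which is how the text above reads; the stand-in `type_e_only` reproduces only the single case worked through in [0044] (its "BG" fallback is arbitrary), and the priority marking of types B and D is not modeled.

```python
from typing import Callable, Dict, List, Tuple

# decide(n_bg, n_parts, n_undetermined) -> "BG" or "PARTS"
Decider = Callable[[int, int, int], str]


def identify(class_attrs: Dict[str, str],
             decide: Decider) -> Tuple[List[str], List[str]]:
    """Split extracted objects into (backgrounds, parts), resolving NOT_DEF."""
    # 701-702: classify by CLASS_ATTR and count each class
    backgrounds = [o for o, a in class_attrs.items() if a == "BG"]
    parts = [o for o, a in class_attrs.items() if a == "PARTS"]
    undetermined = [o for o, a in class_attrs.items() if a == "NOT_DEF"]
    # 703: is any undetermined object present?
    if undetermined:
        # 704: one lookup using the counts taken before resolution
        verdict = decide(len(backgrounds), len(parts), len(undetermined))
        if verdict == "BG":
            backgrounds.extend(undetermined)
        else:
            parts.extend(undetermined)
    return backgrounds, parts


def type_e_only(n_bg: int, n_parts: int, n_undetermined: int) -> str:
    """Hypothetical stand-in for rule 109 covering only the [0044] case."""
    return "PARTS" if (n_bg, n_parts, n_undetermined) == (3, 0, 1) else "BG"
```

Passing the rule in as a function keeps the classification flow independent of how drawing 5 is actually encoded.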

[0033] In drawing 1, the image selection means 104 receives two or more object data, extracts for each object the pixel data of the image components pointed to by the pointers of the object data, and selects only the pixel data pointed to by all of the object data; drawing 8 shows an example of its operation.

[0034] First, the object data are classified into background and parts (801), and retrieval of the background image is performed first (802). In background image retrieval (802), the number of background objects is checked first (803). If there are two or more background objects, the image components are extracted by following the pointers of each object data, and a list storing the pixel data described by those image components is created for each object data (804). Next, any one of the input object data is chosen, and the following operations are performed (805).

[0035] 1.) Take one pixel data item from the list for the selected object data.

[0036] 2.) If that pixel data item is stored in all the other lists, keep it; otherwise, delete it.

[0037] 3.) If other pixel data remain in the list, continue with the processing of step 2 above for the next item.

[0038] Finally, the pixel data remaining in the list for the object data selected in process 805 are output, and the image selection means 104 ends its processing.

[0039] If there is only one background object, the image components are extracted by following the pointers of that object data, and the pixel data described by each image component are output (806).
[0040] Parts image retrieval (807) extracts the image components by following the pointers of the object data, as in the case of a single background object, outputs the pixel data described by each image component, and performs this for each parts image (808).
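The selection procedure of drawing 8 amounts to keeping only the pixel data that appear in every background object's list. A minimal Python sketch with hypothetical function names; it assumes the per-object pixel-data lists have already been built by following the object data pointers (804), and it treats an empty background dictionary as yielding no background images, a case the text does not address.

```python
from typing import Dict, List, Tuple


def select_common(pixel_lists: Dict[str, List[str]]) -> List[str]:
    """Steps 1.) to 3.) of 805: keep pixel data found in every list."""
    if not pixel_lists:
        return []
    names = list(pixel_lists)
    chosen = pixel_lists[names[0]]   # any one object data may be chosen
    others = [set(pixel_lists[n]) for n in names[1:]]
    # Keep an item only if every other list also contains it; drop the rest.
    return [p for p in chosen if all(p in other for other in others)]


def select_images(background: Dict[str, List[str]],
                  parts: Dict[str, List[str]]
                  ) -> Tuple[List[str], Dict[str, List[str]]]:
    """Background retrieval (802 to 806), then parts retrieval (807, 808)."""
    if len(background) >= 2:
        bg_result = select_common(background)               # 804, 805
    else:
        # 806: a single background object; an empty dict simply yields []
        bg_result = [p for lst in background.values() for p in lst]
    # 807, 808: each part is retrieved independently of the others
    parts_result = {name: list(lst) for name, lst in parts.items()}
    return bg_result, parts_result
```

With the lists of [0046] (mountain: Geneva1.img, Alps1.img; lake: Geneva1.img, Lake-Leman.img; Geneva: Geneva1.img, Geneva2.img), `select_common` keeps only Geneva1.img. Which list is chosen affects only the order of the result, not its contents, consistent with the order-independence noted in [0041].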
[0041] The image selection means 104 of this example retrieves the pixel data of the background first and then the pixel data of each part, but the retrieval result is guaranteed to be unaffected if the order of background and parts retrieval is swapped.

[0042] This concludes the description of the means constituting this system and of each data structure. The operation of this retrieval method is described below with reference to drawings 1 to 8. To explain the operation concretely, the flow of processing when the following sentence (1) is input is shown as an example:

"There is a helicopter in the sky over Geneva, beautiful with its mountains and lake." (1)

1.) Sentence (1) is input with the input means 101, converted into character-string data, and sent to the language analysis means 102.

2.) The language analysis means 102 first divides sentence (1) into words by morphological analysis, as follows.
[0043] 

helicopter / mountain / lake / beautiful / Geneva / sky (2)

Next, each of the obtained words (2) is checked against the object names of all object data registered in the object data management means 107. Assuming that object data are prepared for every node of the tree structure of drawing 3, four of the words in (2) have object data, namely "helicopter", "mountain", "lake", and "Geneva", and the object data corresponding to these four words are extracted.

3.) The object identification means 103 identifies whether each of the objects "helicopter", "mountain", "lake", and "Geneva" is a background or a part, using the general information of the object data and the decision rule for undetermined objects 109. With the examples of drawings 2 (a) to (f) as object data, the objects are identified as follows.

[0044] Background: "mountain", "lake", "Geneva"
Undetermined: "helicopter"

Since the undetermined object "helicopter" was found, whether it is a background or a part is decided next using the decision rule for undetermined objects 109 of drawing 5. The processing result of the language analysis means 102 gives 3, 0, and 1 as the numbers of background, part, and undetermined objects, respectively. Looking up the type of sentence analysis result corresponding to these numbers in the table of drawing 5 gives type E, and checking the last column for type E shows that the undetermined object "helicopter" is a part.

4.) For the object data whose status as background or part has been decided by the object identification means 103, image components are retrieved by the image selection means 104. The object data have now been decided as follows.

Background: "mountain", "lake", "Geneva"

Parts: "helicopter"

These are retrieved separately as background and parts. To retrieve background images containing all of "mountain", "lake", and "Geneva", the three object data "mountain", "lake", and "Geneva" are given as input to the image selection means 104.

[0045] First, the image components depicting each of these objects are referred to, and a list storing the pixel data retrieved from those image components is created. The creation of the list for the object data "mountain" is explained next, using the example of drawing 2 (a) as object data. Pixel data containing a "mountain" are to be retrieved from the image components belonging to the object data "mountain", but no image components are registered in those object data. Therefore, following LOWER_CLASS, the image components of the lower-level object data "Alps" and "Mt. Fuji" are referred to, and pixel data are retrieved from those image components. Eventually, a list containing all the pixel data depicting the objects below the object data "mountain" is created. The lists created in the same way for "lake" and "Geneva" are as follows.
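Building a list by descending through LOWER_CLASS, as done for "mountain" above, is a small recursive traversal. The Python sketch below uses hypothetical data (`OBJECTS`, `COMPONENTS`, and the file `Fuji1`, which the text does not name); the `seen` set is an added safeguard against cyclic LOWER_CLASS links, not something the specification describes.

```python
from typing import Dict, List, Optional, Set

# Hypothetical object data: name -> LOWER_CLASS names and IMAGE_FILE names
OBJECTS: Dict[str, Dict[str, List[str]]] = {
    "mountain": {"LOWER_CLASS": ["Mt. Fuji", "Alps"], "IMAGE_FILE": []},
    "Alps": {"LOWER_CLASS": [], "IMAGE_FILE": ["Geneva1", "Alps1"]},
    "Mt. Fuji": {"LOWER_CLASS": [], "IMAGE_FILE": ["Fuji1"]},  # hypothetical file
}

# Hypothetical image components: component file -> IMAGE (pixel data pointer)
COMPONENTS: Dict[str, str] = {
    "Geneva1": "Geneva1.img",
    "Alps1": "Alps1.img",
    "Fuji1": "Fuji1.img",
}


def collect_pixel_data(name: str, seen: Optional[Set[str]] = None) -> List[str]:
    """Pixel data of `name` and of every object below it via LOWER_CLASS."""
    seen = set() if seen is None else seen
    if name in seen or name not in OBJECTS:
        return []  # unknown object, or already visited
    seen.add(name)
    result: List[str] = []
    # The object's own image components (follow IMAGE_FILE, then IMAGE)
    for component in OBJECTS[name]["IMAGE_FILE"]:
        pixel = COMPONENTS[component]
        if pixel not in result:
            result.append(pixel)
    # Then descend through the conceptually lower-level objects
    for lower in OBJECTS[name]["LOWER_CLASS"]:
        for pixel in collect_pixel_data(lower, seen):
            if pixel not in result:
                result.append(pixel)
    return result
```

Because "mountain" registers no images of its own, every item in its list comes from the lower-level objects, in the order they are listed under LOWER_CLASS.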

[0046] Mountain: Geneva1.img, Alps1.img, ...
Lake: Geneva1.img, Lake-Leman.img, ...
Geneva: Geneva1.img, Geneva2.img, ...

Next, the pixel data pointed to by all of the object data are selected. If "mountain" is chosen as the one input object data, Geneva1.img is kept because it is contained in all the other lists, whereas Alps1.img is deleted because it is not. In this way, Geneva1 and others are extracted as the pixel data containing all the object data and become the retrieval result for the background image.

[0047] Parts images are retrieved separately. Here the only part is "helicopter", so helicopter1, helicopter2, ... are retrieved as pixel data by referring to the image components belonging to the object data "helicopter".

5.) The display means 105 displays the retrieved pixel data divided into background and parts, and retrieval processing ends.

[0048] 

[Effect of the Invention] In the image retrieval system according to this invention, images are classified in advance into backgrounds and parts. A natural-language sentence is used as the retrieval query, the input sentence is analyzed, and the background image and each parts image constituting the foreground are retrieved automatically, so the images serving as components of the scene expressed by the input sentence can be retrieved without omission.

[0049] Moreover, by incorporating the image retrieval apparatus of this invention into a system that edits and composes natural images, many scenes can be composed even from a small-scale image database by arranging parts images on background images.

[0050] Furthermore, because the system has rules that decide from the situation whether an object whose status depends on the situation is a background or a part, images more strongly related to the content of the input sentence can be retrieved.


